We're pretty sure we've sorted out almost all of our problems, which boiled down to three "core" issues.
- A large number of orphaned agent data notes (>300) combined with a very large database (>2 GB) made the database completely unusable. For some reason, the orphaned agent data notes, compounded by the sheer size of the database, caused "random server issues" such as socket errors, memory errors, 100% processor usage in the compact & update tasks, etc.
- We tried moving the users of that one database onto a newly built server to see if it was a strange hardware or server config issue. The new server crashed every day or two. We eventually determined that we had a bad stick of RAM in the server. (doh!)
- After replacing the RAM, we had databases going corrupt on a regular basis (about 5 new databases every day). We're still not sure what caused this, but it definitely seems to be a hardware issue, probably disk related. The system was likely on its last legs: when we removed it and moved it to our IT lab for testing, the machine refused to boot and went into an endless cycle of Linux disk checks. Yikes! Could be a bad cable, bad hard drive, bad disk controller, etc. Luckily this server is no longer in production ;-)
- The Linux tweaks in the Redbook helped improve the performance of our servers. Yay!